AITopics | optimality operator

Collaborating Authors

optimality operator

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Accelerating Value Iteration with Anchoring

Neural Information Processing SystemsFeb-16-2026, 10:16:45 GMT

In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Accelerating Value Iteration with Anchoring Jongmin Lee 1 Ernest K. Ryu

Neural Information Processing SystemsFeb-16-2026, 10:16:42 GMT

In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Accelerating Value Iteration with Anchoring

Neural Information Processing SystemsOct-9-2025, 04:08:18 GMT

In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators.

inequality, operator, optimality operator, (14 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Reviews: A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation

Neural Information Processing SystemsJan-23-2025, 12:28:40 GMT

Summary: The paper proposes a new algorithm for learning linear preferences, which are objectives derived from a linear weighting of a vector reward function, in multi-objective reinforcement learning (MORL). The proposed algorithm achieves this by performing updates that use the convex envelope of the solution frontier to update the parameters of the action-value function, hence its name: envelop Q-learning. This is done by first defining a multi-objective version of the action-value function along with a pseudo-metric, the supremum over the states, actions, and preferences. Then, a Bellman operator is defined for the multi-objective action-value function along with an optimality filter, which together define a new optimality operator. Using all these definitions, the paper then shows three main theoretical results: 1) the optimality operator has a fixed point that maximizes the amount of reward under any given preference, 2) the optimality operator is a contraction, and 3) for any Q in the pseudo-metric space, iterative applications of the optimality operator will result in an action-value function which distance to the fixed point is equal to zero, i.e. is equivalent to the fixed point.

action-value function, figure 2, operator, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Accelerating Value Iteration with Anchoring

Lee, Jongmin, Ryu, Ernest K.

arXiv.org Artificial IntelligenceOct-28-2023

Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at a $\mathcal{O}(\gamma^k)$-rate, where $\gamma$ is the discount factor. Surprisingly, however, the optimal rate for the VI setup was not known, and finding a general acceleration mechanism has been an open problem. In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators. Our method, called Anc-VI, is based on an \emph{anchoring} mechanism (distinct from Nesterov's acceleration), and it reduces the Bellman error faster than standard VI. In particular, Anc-VI exhibits a $\mathcal{O}(1/k)$-rate for $\gamma\approx 1$ or even $\gamma=1$, while standard VI has rate $\mathcal{O}(1)$ for $\gamma\ge 1-1/k$, where $k$ is the iteration count. We also provide a complexity lower bound matching the upper bound up to a constant factor of $4$, thereby establishing optimality of the accelerated rate of Anc-VI. Finally, we show that the anchoring mechanism provides the same benefit in the approximate VI and Gauss--Seidel VI setups as well.

inequality, operator, optimality operator, (14 more...)

arXiv.org Artificial Intelligence

2305.16569

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm

Basaklar, Toygun, Gumussoy, Suat, Ogras, Umit Y.

arXiv.org Artificial IntelligenceMay-29-2023

Multi-objective reinforcement learning (MORL) approaches have emerged to tackle many real-world problems with multiple conflicting objectives by maximizing a joint objective function weighted by a preference vector. These approaches find fixed customized policies corresponding to preference vectors specified during training. However, the design constraints and objectives typically change dynamically in real-life scenarios. Furthermore, storing a policy for each potential preference is not scalable. Hence, obtaining a set of Pareto front solutions for the entire preference space in a given domain with a single training is critical. To this end, we propose a novel MORL algorithm that trains a single universal network to cover the entire preference space scalable to continuous robotic tasks. The proposed approach, Preference-Driven MORL (PD-MORL), utilizes the preferences as guidance to update the network parameters. It also employs a novel parallelization approach to increase sample efficiency. We show that PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches. The main objective in a standard RL setting is to obtain a policy that maximizes a single cumulative reward by interacting with the environment. However, many real-world problems involve multiple, possibly conflicting, objectives. For example, robotics tasks should maximize speed while minimizing energy consumption. In contrast to single-objective environments, performance is measured using multiple objectives. Consequently, there are multiple Pareto-optimal solutions as a function of the preference between objectives (Navon et al., 2020). Multi-objective reinforcement learning (MORL) approaches (Hayes et al., 2022) have emerged to tackle these problems by maximizing a vector of rewards depending on the preferences. Existing approaches for multi-objective optimization generally transform the multidimensional objective space into a single dimension by statically assigning weights (preferences) to each objective (Liu et al., 2014).

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2208.07914

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report > Promising Solution (0.46)

Industry:

Information Technology (0.45)
Energy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Multi-Objective Influence Diagrams with Possibly Optimal Policies

Marinescu, Radu (IBM, Dublin) | Razak, Abdul (University College Cork) | Wilson, Nic (University College Cork)

AAAI ConferencesFeb-14-2017

The formalism of multi-objective influence diagrams has recently been developed for modeling and solving sequential decision problems under uncertainty and multiple objectives. Since utility values representing the decision maker's preferences are only partially ordered (e.g., by the Pareto order) we no longer have a unique maximal value of expected utility, but a set of them. Computing the set of maximal values of expected utility and the corresponding policies can be computationally very challenging. In this paper, we consider alternative notions of optimality, one of the most important one being the notion of possibly optimal, namely optimal in at least one scenario compatible with the inter-objective tradeoffs. We develop a variable elimination algorithm for computing the set of possibly optimal expected utility values, prove formally its correctness, and compare variants of the algorithm experimentally.

artificial intelligence, bayesian inference, machine learning, (14 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: Europe (0.28)

Technology: